Multi-modal humor segment prediction in video

Abstract

Humor can be induced by various signals in the visual, linguistic, and vocal modalities emitted by humans. Finding humor in videos is an interesting but challenging task for an intelligent system. Previous methods predict humor at the sentence level given some text (e.g., a speech transcript), sometimes together with other modalities, such as speech. Such methods ignore humor caused by the visual modality in their design, since the prediction is made for a sentence. In this work, we first give new annotations to humor based on a sitcom by setting up temporal segments of ground-truth humor derived from the laughter track. Then, we propose a method to find these temporal segments of humor. We adopt a sliding-window approach, where the visual modality is described by pose and facial features, along with the linguistic modality given as subtitles, in each window. We use long short-term memory networks to encode the temporal dependency in poses and facial features, and pre-trained BERT to handle the subtitles. Experimental results show that our method improves the performance of humor prediction.
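The sliding-window formulation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the window length, the stride, the overlap-ratio labeling rule (`min_ratio`), and every function name below are assumptions chosen for illustration. The abstract only states that ground-truth humor segments are derived from the laughter track and that prediction is made per window. Feature extraction and the LSTM and BERT encoders are left out.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_sec, end_sec)


def sliding_windows(duration: float, length: float, stride: float) -> List[Segment]:
    """Enumerate fixed-length windows that fit inside [0, duration]."""
    windows = []
    i = 0
    while i * stride + length <= duration + 1e-9:
        start = i * stride
        windows.append((start, start + length))
        i += 1
    return windows


def overlap(a: Segment, b: Segment) -> float:
    """Length in seconds of the intersection of two segments."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def label_windows(windows: List[Segment],
                  humor_segments: List[Segment],
                  min_ratio: float = 0.5) -> List[int]:
    """Assumed labeling rule: a window is positive when at least
    `min_ratio` of it is covered by ground-truth humor segments.
    The ground-truth segments are assumed not to overlap each other."""
    labels = []
    for w in windows:
        covered = sum(overlap(w, s) for s in humor_segments)
        labels.append(1 if covered / (w[1] - w[0]) >= min_ratio else 0)
    return labels


def merge_positive(windows: List[Segment], labels: List[int]) -> List[Segment]:
    """Merge touching or overlapping positive windows into segments."""
    merged: List[Segment] = []
    for (start, end), y in zip(windows, labels):
        if not y:
            continue
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


windows = sliding_windows(duration=10.0, length=4.0, stride=2.0)
labels = label_windows(windows, humor_segments=[(3.0, 7.0)])
segments = merge_positive(windows, labels)
```

In a trained system the labels passed to `merge_positive` would be per-window model predictions rather than ground-truth labels. The merging step is one simple way to turn window-level decisions back into temporal segments.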

Similar resources

Multi-modal Target Prediction

Users with severe motor impairment often depend on alternative input devices like eye-gaze or head-movement trackers to access computers. However, these devices are not as fast as a computer mouse and often prove difficult to use. We have proposed a neural-network-based model that can predict the pointing target by analyzing the pointing trajectory. We have validated the model for standard computer mouse,...

Link Prediction in Multi-modal Social Networks

Online social networks like Facebook recommend new friends to users based on an explicit social network that users build by adding each other as friends. The majority of earlier work in link prediction infers new interactions between users by mainly focusing on a single network type. However, users also form several implicit social networks through their daily interactions like commenting on pe...

Multi-Modal Tracking for Video Compression

This paper describes a system which uses multiple visual processes to detect and track faces for video compression and transmission. The system is based on an architecture in which a supervisor selects and activates visual processes in a cyclic manner. Control of visual processes is made possible by a confidence factor which accompanies each observation. Fusion of results into a unified estimatio...

Multi-modal Aggregation for Video Classification

In this paper, we present a solution to the Large-Scale Video Classification Challenge (LSVC2017) [1] that ranked first. We focused on a variety of modalities that cover visual, motion, and audio. We also visualized the aggregation process to better understand how each modality takes effect. Among the extracted modalities, we found Temporal-Spatial features calculated by 3D convolution quit...

Journal

Journal title: Multimedia Systems

Year: 2023

ISSN: 1432-1882, 0942-4962

DOI: https://doi.org/10.1007/s00530-023-01105-x